Selecting Samples and Features for SVM Based on Neighborhood Model
Authors
Abstract
Support vector machines (SVMs) are a popular class of learning algorithms with good generalization ability. However, training an SVM on a large set of samples is time-consuming, so improving learning efficiency is one of the most important research tasks. It is known that, although a learning task may offer many candidate training samples, only the samples near the decision boundary influence the classification hyperplane. Finding these samples and training the SVM with them alone may greatly reduce the time and space complexity of training. Based on this observation, we introduce a neighborhood-based rough set model to search for boundary samples. With the model, we divide the sample space into two subsets: the positive region and the boundary samples. In addition, we partition the features into several subsets: strongly relevant features, weakly relevant and indispensable features, weakly relevant and superfluous features, and irrelevant features. We train the SVM with the boundary samples in the relevant and indispensable feature subspaces, so feature selection and sample selection are carried out simultaneously with the proposed model. Experiments are performed to test the proposed method. The results show that the model can select very few features and samples for training while the classification performance is maintained or improved.
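The split into positive region and boundary samples can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean distance, the radius name `delta`, the function name `boundary_samples`, and the toy data are all assumptions made here for illustration. A sample is treated as a boundary sample when its neighborhood of radius `delta` contains at least one sample of another class; all other samples fall in the positive region.

```python
import math

def boundary_samples(X, y, delta):
    """Return indices of samples whose delta-neighborhood contains
    at least one sample with a different class label.

    Assumption: Euclidean distance and a fixed radius `delta`;
    the abstract does not specify either choice.
    """
    boundary = []
    for i, xi in enumerate(X):
        for j, xj in enumerate(X):
            if y[j] != y[i] and math.dist(xi, xj) <= delta:
                boundary.append(i)
                break  # one differently labeled neighbor is enough
    return boundary

# Hypothetical toy data: two classes that meet near x = 1.0.
X = [(0.0, 0.0), (0.2, 0.1), (0.9, 1.0), (1.1, 1.0), (2.0, 2.0), (2.1, 1.9)]
y = [0, 0, 0, 1, 1, 1]

boundary = boundary_samples(X, y, delta=0.5)
positive = [i for i in range(len(X)) if i not in boundary]
```

Under these assumptions, only the samples indexed by `boundary` would be passed to the SVM trainer, which is where the reduction in training cost described in the abstract would come from. This pairwise loop is quadratic in the number of samples and is meant only to show the idea.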
Similar resources
Optimal Feature Extraction for Discriminating Raman Spectra of Different Skin Samples using Statistical Methods and Genetic Algorithm
Introduction: Raman spectroscopy, a spectroscopic technique based on inelastic scattering of monochromatic light, can provide valuable information about molecular vibrations, so it can be used to study molecular changes in a sample. Material and Methods: In this research, 153 Raman spectra were obtained from normal and dried skin samples. Baseline and electrical noise were eliminat...
Developing a New Method in Object Based Classification to Updating Large Scale Maps with Emphasis on Building Feature
As cities expand, updating urban maps for urban planning becomes important, and its effectiveness depends on the accuracy of information extraction and change detection. Information extraction methods are divided into two groups: Pixel-Based (PB) and Object-Based (OB). OB analysis has overcome the limitations of PB analysis (producing salt-pepper results and features with hole...
The Role of Open Spaces in Neighborhood Attachment (Case Study: Ekbatan Town in Tehran Metropolis)
This paper intends to highlight one of the key concerns of housing designers, i.e. the influence of open space on human satisfaction as well as on attachment to a particular place. Consequently, neighborhood open spaces with their meaningful layout seem to be constituent elements of residential complexes. Although architects and designers try to create qualitative areas among buildings, the ...
Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA
With the explosive growth in the amount of information, tools and methods are needed to search, filter and manage resources. One of the major problems in text classification relates to high-dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of the feature space. There are many feature selection methods. However...
Feature Extraction and Efficiency Comparison Using Dimension Reduction Methods in Sentiment Analysis Context
Nowadays, users can share their ideas and opinions with widespread access to the Internet and especially social networks. On the other hand, the analysis of people's feelings and ideas can play a significant role in the decision making of organizations and producers. Hence, sentiment analysis or opinion mining is an important field in natural language processing. One of the most common ways to ...
Journal title:
Volume, issue:
Pages: -
Publication date: 2007